We present SpeechMatrix, a large-scale multilingual corpus of speech-to-speech translations mined from real speech of European Parliament recordings. It contains speech alignments in 136 language pairs with a total of 418 thousand hours of speech. To evaluate the quality of this parallel speech, we train bilingual speech-to-speech translation models on mined data only and establish extensive baseline results on EuroParl-ST, VoxPopuli and FLEURS test sets. Enabled by the multilinguality of SpeechMatrix, we also explore multilingual speech-to-speech translation, a topic which was addressed by few other works. We also demonstrate that model pre-training and sparse scaling using Mixture-of-Experts bring large gains to translation performance. The mined data and models are freely available.
translated by 谷歌翻译
轨迹预测和行为决策是自动驾驶汽车的两项重要任务,他们需要对环境环境有良好的了解;通过参考轨迹预测的输出,可以更好地做出行为决策。但是,大多数当前解决方案分别执行这两个任务。因此,提出了结合多个线索的联合神经网络,并将其命名为整体变压器,以预测轨迹并同时做出行为决策。为了更好地探索线索之间的内在关系,网络使用现有知识并采用三种注意力机制:稀疏的多头类型用于减少噪声影响,特征选择稀疏类型,可最佳地使用部分先验知识,并与Sigmoid多头激活类型,用于最佳使用后验知识。与其他轨迹预测模型相比,所提出的模型具有更好的综合性能和良好的解释性。感知噪声稳健性实验表明,所提出的模型具有良好的噪声稳健性。因此,结合多个提示的同时轨迹预测和行为决策可以降低计算成本并增强场景与代理之间的语义关系。
translated by 谷歌翻译
AMR到文本是NLP社区中旨在从抽象含义表示(AMR)图生成句子的关键技术之一。自2013年提出AMR以来,有关AMR到文本的研究越来越普遍,因为AMR作为自然语言的高级语义描述,由于AMR具有独特的优势,因此作为结构化数据的重要分支变得越来越普遍。在本文中,我们简要介绍了AMR到文本。首先,我们介绍了此技术的当前情况,并指出了它的困难。其次,根据先前研究中使用的方法,我们根据它们各自的机制将它们大致分为五个类别和预先训练的语言模型(PLM)。特别是,我们详细介绍了基于神经网络的方法,并介绍了AMR到文本的最新进展,该方法指的是AMR重建,解码器优化等。此外,我们介绍了AMR-TOXT的基准和评估方法。最终,我们提供了当前技术和未来研究的前景的摘要。
translated by 谷歌翻译
We present the OPEN GRAPH BENCHMARK (OGB), a diverse set of challenging and realistic benchmark datasets to facilitate scalable, robust, and reproducible graph machine learning (ML) research. OGB datasets are large-scale, encompass multiple important graph ML tasks, and cover a diverse range of domains, ranging from social and information networks to biological networks, molecular graphs, source code ASTs, and knowledge graphs. For each dataset, we provide a unified evaluation protocol using meaningful application-specific data splits and evaluation metrics. In addition to building the datasets, we also perform extensive benchmark experiments for each dataset. Our experiments suggest that OGB datasets present significant challenges of scalability to large-scale graphs and out-of-distribution generalization under realistic data splits, indicating fruitful opportunities for future research. Finally, OGB provides an automated end-to-end graph ML pipeline that simplifies and standardizes the process of graph data loading, experimental setup, and model evaluation. OGB will be regularly updated and welcomes inputs from the community. OGB datasets as well as data loaders, evaluation scripts, baseline code, and leaderboards are publicly available at https://ogb.stanford.edu.
translated by 谷歌翻译
Projection operations are a typical computation bottleneck in online learning. In this paper, we enable projection-free online learning within the framework of Online Convex Optimization with Memory (OCO-M) -- OCO-M captures how the history of decisions affects the current outcome by allowing the online learning loss functions to depend on both current and past decisions. Particularly, we introduce the first projection-free meta-base learning algorithm with memory that minimizes dynamic regret, i.e., that minimizes the suboptimality against any sequence of time-varying decisions. We are motivated by artificial intelligence applications where autonomous agents need to adapt to time-varying environments in real-time, accounting for how past decisions affect the present. Examples of such applications are: online control of dynamical systems; statistical arbitrage; and time series prediction. The algorithm builds on the Online Frank-Wolfe (OFW) and Hedge algorithms. We demonstrate how our algorithm can be applied to the online control of linear time-varying systems in the presence of unpredictable process noise. To this end, we develop the first controller with memory and bounded dynamic regret against any optimal time-varying linear feedback control policy. We validate our algorithm in simulated scenarios of online control of linear time-invariant systems.
translated by 谷歌翻译
Several self-supervised representation learning methods have been proposed for reinforcement learning (RL) with rich observations. For real-world applications of RL, recovering underlying latent states is crucial, particularly when sensory inputs contain irrelevant and exogenous information. In this work, we study how information bottlenecks can be used to construct latent states efficiently in the presence of task-irrelevant information. We propose architectures that utilize variational and discrete information bottlenecks, coined as RepDIB, to learn structured factorized representations. Exploiting the expressiveness bought by factorized representations, we introduce a simple, yet effective, bottleneck that can be integrated with any existing self-supervised objective for RL. We demonstrate this across several online and offline RL benchmarks, along with a real robot arm task, where we find that compressed representations with RepDIB can lead to strong performance improvements, as the learned bottlenecks help predict only the relevant state while ignoring irrelevant information.
translated by 谷歌翻译
The current optical communication systems minimize bit or symbol errors without considering the semantic meaning behind digital bits, thus transmitting a lot of unnecessary information. We propose and experimentally demonstrate a semantic optical fiber communication (SOFC) system. Instead of encoding information into bits for transmission, semantic information is extracted from the source using deep learning. The generated semantic symbols are then directly transmitted through an optical fiber. Compared with the bit-based structure, the SOFC system achieved higher information compression and a more stable performance, especially in the low received optical power regime, and enhanced the robustness against optical link impairments. This work introduces an intelligent optical communication system at the human analytical thinking level, which is a significant step toward a breakthrough in the current optical communication architecture.
translated by 谷歌翻译
Configurable software systems are employed in many important application domains. Understanding the performance of the systems under all configurations is critical to prevent potential performance issues caused by misconfiguration. However, as the number of configurations can be prohibitively large, it is not possible to measure the system performance under all configurations. Thus, a common approach is to build a prediction model from a limited measurement data to predict the performance of all configurations as scalar values. However, it has been pointed out that there are different sources of uncertainty coming from the data collection or the modeling process, which can make the scalar predictions not certainly accurate. To address this problem, we propose a Bayesian deep learning based method, namely BDLPerf, that can incorporate uncertainty into the prediction model. BDLPerf can provide both scalar predictions for configurations' performance and the corresponding confidence intervals of these scalar predictions. We also develop a novel uncertainty calibration technique to ensure the reliability of the confidence intervals generated by a Bayesian prediction model. Finally, we suggest an efficient hyperparameter tuning technique so as to train the prediction model within a reasonable amount of time whilst achieving high accuracy. Our experimental results on 10 real-world systems show that BDLPerf achieves higher accuracy than existing approaches, in both scalar performance prediction and confidence interval estimation.
translated by 谷歌翻译
A critical step in sharing semantic content online is to map the structural data source to a public domain ontology. This problem is denoted as the Relational-To-Ontology Mapping Problem (Rel2Onto). A huge effort and expertise are required for manually modeling the semantics of data. Therefore, an automatic approach for learning the semantics of a data source is desirable. Most of the existing work studies the semantic annotation of source attributes. However, although critical, the research for automatically inferring the relationships between attributes is very limited. In this paper, we propose a novel method for semantically annotating structured data sources using machine learning, graph matching and modified frequent subgraph mining to amend the candidate model. In our work, Knowledge graph is used as prior knowledge. Our evaluation shows that our approach outperforms two state-of-the-art solutions in tricky cases where only a few semantic models are known.
translated by 谷歌翻译
Along with the widespread use of face recognition systems, their vulnerability has become highlighted. While existing face anti-spoofing methods can be generalized between attack types, generic solutions are still challenging due to the diversity of spoof characteristics. Recently, the spoof trace disentanglement framework has shown great potential for coping with both seen and unseen spoof scenarios, but the performance is largely restricted by the single-modal input. This paper focuses on this issue and presents a multi-modal disentanglement model which targetedly learns polysemantic spoof traces for more accurate and robust generic attack detection. In particular, based on the adversarial learning mechanism, a two-stream disentangling network is designed to estimate spoof patterns from the RGB and depth inputs, respectively. In this case, it captures complementary spoofing clues inhering in different attacks. Furthermore, a fusion module is exploited, which recalibrates both representations at multiple stages to promote the disentanglement in each individual modality. It then performs cross-modality aggregation to deliver a more comprehensive spoof trace representation for prediction. Extensive evaluations are conducted on multiple benchmarks, demonstrating that learning polysemantic spoof traces favorably contributes to anti-spoofing with more perceptible and interpretable results.
translated by 谷歌翻译